Abstract

The paper considers general multiplicative models for complete and
incomplete contingency tables that generalize log-linear and several
other models and are entirely coordinate free. Sufficient conditions for
the existence of maximum likelihood estimates under these models are
given, and it is shown that the usual equivalence between multinomial
and Poisson likelihoods holds if and only if an overall effect is present
in the model. If such an effect is not assumed, the model becomes
a curved exponential family and a related mixed parameterization is
given that relies on non-homogeneous odds ratios. Several examples
are presented to illustrate the properties and use of such models.

Keywords: Contingency tables, curved exponential family, exponential
family, generalized odds ratios, maximum likelihood estimate,
multiplicative model



Introduction 



The main objective of the paper is to develop a new class of models for the set 
of all strictly positive distributions on contingency tables and on some sets 
of cells that have a more general structure. The proposed relational models 
are motivated by traditional log-linear models, quasi models and some other 
multiplicative models for discrete distributions that have been discussed in 
the literature. 

Under log-linear models (Bishop et al., 1975), cell probabilities are
determined by multiplicative effects associated with various subsets of the
variables in the contingency table. However, some cells may have other
characteristics in common, and there has always been interest in models that
also allow for multiplicative effects that are associated with those
characteristics. Examples, among others, include quasi models (Goodman, 1968,
1972), topological models (Hauser, 1978; Hout, 1983), indicator models
(Zelterman & Youn, 1992), rater agreement-disagreement models (Tanner &
Young, 1985a,b), and two-way subtable sum models (Hara et al., 2009). All
these models, applied in different contexts, have one common idea behind
them. A model
is generated by a class of subsets of cells, some of which may not be induced 
by marginals of the table, and, under the model, every cell probability is the 
product of effects associated with subsets the cell belongs to. This idea is 
generalized in the relational models framework. 

The outline of the paper is as follows. The definition of a table and 
the definition of a relational model generated by a class of subsets of cells 
in the table are given in Section 1. The cells are characterized by strictly
positive parameters (probabilities or intensities); a table is a structured set 
of cells. Under the model, the parameter of each cell is the product of effects 
associated with the subsets in the generating class, to which the cell belongs. 
Two examples are given to illustrate this definition. Example 1.1 shows
how traditional log-linear models fit into the framework and Example 1.2
describes how multiplicative models for incomplete contingency tables are
handled.

The degrees of freedom and the dual representation of relational models 
are discussed in Section 2. Every relational model can be stated in terms
of generalized odds ratios. The minimal number of generalized odds ratios 
required to specify the model is equal to the number of degrees of freedom 
in this model. 

The models for probabilities that include the overall effect and all rela- 
tional models for intensities are regular exponential families. Under known 
conditions (cf. Barndorff-Nielsen, 1978), the maximum likelihood estimates 
for cell frequencies exist and are unique; the observed values of canonical 
statistics are equal to their expected values. If the overall effect is not present, 
a relational model for probabilities forms a curved exponential family. The 
maximum likelihood estimates in the curved case exist and are unique under 
the same condition as for regular families; the observed values of canonical 
statistics are proportional to their expected values. The maximum likelihood 
estimates for cell frequencies under a model for intensities and under a model
for probabilities, when the model matrix is the same, are equal if and only
if the model for probabilities is a regular family. These facts are proved in
Section 3.

A mixed parameterization of finite discrete exponential families is
discussed in Section 4. Any relational model is naturally defined under this
parameterization: the corresponding generalized odds ratios are fixed and 
the model is parameterized by remaining mean-value parameters. The dis- 
tributions of observed values of subset sums and generalized odds ratios are 
variation independent and, in the regular case, specify the table uniquely. 

Two applications of the framework are presented in Section 5. These
are the analysis of social mobility data and the analysis of a valued network 
with given attributes. These two examples suggest that the flexibility of 
the framework and substantive interpretation of parameters make relational 
models appealing for many settings. 

1 Definition and Log-linear Representation of 
Relational Models 

Let $Y_1, \dots, Y_K$ be the discrete random variables modeling certain
characteristics of the population of interest. Denote the domains of the
variables by $\mathcal{Y}_1, \dots, \mathcal{Y}_K$ respectively. A point
$(y_1, y_2, \dots, y_K) \in \mathcal{Y}_1 \times \dots \times \mathcal{Y}_K$
generates a cell if and only if the outcome $(y_1, y_2, \dots, y_K)$ appears
in the population. A cell $(y_1, y_2, \dots, y_K)$ is called empty if the
combination is not included in the design.

Let $\mathcal{I}$ denote the lexicographically ordered set of non-empty cells
in $\mathcal{Y}_1 \times \dots \times \mathcal{Y}_K$, and $|\mathcal{I}|$
denote the cardinality of $\mathcal{I}$. Since the case
$\mathcal{I} = \mathcal{Y}_1 \times \dots \times \mathcal{Y}_K$ corresponds to
a classical complete contingency table, the set $\mathcal{I}$ is also called
a table.

Depending on the procedure that generates data on $\mathcal{I}$, the
population may be characterized by cell probabilities or cell intensities.
The parameters of the true distribution will be denoted by
$\delta = \{\delta(i), \text{ for } i \in \mathcal{I}\}$. In the case of
probabilities, $\delta(i) = p(i) \in (0,1)$, where
$\sum_{i \in \mathcal{I}} p(i) = 1$; in the case of intensities,
$\delta(i) = \lambda(i) > 0$. Let $\mathcal{P}$ denote the set of strictly
positive distributions parameterized by $\delta$.

Definition 1.1. Let $\mathbf{S} = \{S_1, \dots, S_J\}$ be a class of non-empty
subsets of the table $\mathcal{I}$, and $A$ a $J \times |\mathcal{I}|$ matrix
with entries

\[ a_{ji} = I_j(i) = \begin{cases} 1, & \text{if the $i$-th cell is in } S_j, \\ 0, & \text{otherwise,} \end{cases} \quad \text{for } i = 1, \dots, |\mathcal{I}| \text{ and } j = 1, \dots, J. \tag{1} \]

A relational model $RM(\mathbf{S}) \subset \mathcal{P}$ with the model matrix
$A$ is the subset of $\mathcal{P}$ satisfying the equation:

\[ \log \delta = A'\beta, \tag{2} \]

for some $\beta \in \mathbb{R}^J$.

Under the model (2) the parameters of the distribution can also be written
as

\[ \delta(i) = \exp\Big\{\sum_{j=1}^{J} I_j(i)\beta_j\Big\} = \prod_{j=1}^{J} (\theta_j)^{I_j(i)}, \tag{3} \]

where $\theta_j = \exp(\beta_j)$, for $j = 1, \dots, J$.

The parameters $\beta$ in (2) are called the log-linear parameters. The
parameters $\theta$ in (3) are called the multiplicative parameters. If the
subsets in $\mathbf{S}$ are cylinder sets, the parameters $\beta$ coincide
with the parameters of the corresponding log-linear model.

In the case $\delta = p$ it must be assumed that
$\bigcup_{j=1}^{J} S_j = \mathcal{I}$, i.e. there are no zero columns in the
matrix $A$. A zero column implies that one of the probabilities is 1 under
the model and the model is thus trivial.
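
As a rough illustration of Definition 1.1, the sketch below builds the 0-1 model matrix of equation (1) from a class of subsets and evaluates the cell parameters through (2) and (3). The function names `model_matrix` and `cell_parameters` are hypothetical, the three cells and two subsets are those of Example 1.2, and the numerical values of the parameters are arbitrary choices made only for the illustration.

```python
from math import exp, log

def model_matrix(cells, subsets):
    """Equation (1): entry [j][i] is 1 if the i-th cell belongs to S_j, else 0."""
    return [[1 if cell in s else 0 for cell in cells] for s in subsets]

def cell_parameters(A, beta):
    """Equations (2)-(3): delta(i) = exp(sum_j a[j][i] * beta[j])."""
    n_cells = len(A[0])
    return [exp(sum(A[j][i] * beta[j] for j in range(len(A))))
            for i in range(n_cells)]

# Cells and subsets of Example 1.2; the parameter values are arbitrary.
cells = [(0, 0), (0, 1), (1, 0)]
subsets = [{(0, 0), (0, 1)}, {(0, 0), (1, 0)}]
A = model_matrix(cells, subsets)

beta = [log(2.0), log(5.0)]        # theta_1 = 2, theta_2 = 5
delta = cell_parameters(A, beta)   # cell (0,0) gets theta_1 * theta_2
```

Each cell parameter is the product of the multiplicative effects of the subsets containing that cell, which is the defining property of a relational model.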

The example below describes a model of conditional independence as a 
relational model. 

Example 1.1. Consider the model of conditional independence
$[Y_1Y_3][Y_2Y_3]$ of three binary variables $Y_1$, $Y_2$, $Y_3$, each taking
values in $\{0, 1\}$. The model is expressed as

\[ p_{ijk} = \frac{p_{i+k}\, p_{+jk}}{p_{++k}}, \]

where $p_{i+k}$, $p_{+jk}$, $p_{++k}$ are marginal probabilities in the
standard notation (Bishop et al., 1975). Let $\mathbf{S}$ be the class
consisting of the cylindrical sets associated with the empty marginal and the
marginals $Y_1$, $Y_2$, $Y_3$, $Y_1Y_3$, $Y_2Y_3$. The model matrix computed
from (1) is not full row rank and thus the model parameters are not
identifiable (cf. Section 2). A full row rank model matrix can be obtained by
setting, for instance, the level 0 of each variable as the reference level.
After that, the model matrix is equal to

\[ A = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{pmatrix}. \tag{4} \]

The first row corresponds to the cylindrical set associated with the empty
marginal. The next three rows correspond to the cylindrical sets generated
by the level 1 of $Y_1$, $Y_2$, $Y_3$ respectively. The fifth row corresponds
to the cylindrical set generated by the level 1 for both $Y_1$ and $Y_3$, and
the last row to the cylindrical set corresponding to the level 1 for both
$Y_2$ and $Y_3$. □

In the next example, one of the cells in the Cartesian product of the 
domains of the variables is empty and the sample space I is a proper subset 
of this product. 



Example 1.2. The study described by Kawamura et al. (1995) compared
three bait types for trapping swimming crabs: fish alone, sugarcane alone,
and sugarcane-fish combination. During the experiment, traps without bait
were not considered. Three Poisson random variables are used to model the
number of crabs caught in the three traps. The notation for the intensities
is shown in Table 1. The model assuming that there is a multiplicative effect
of using both bait types at the same time will be tested in this paper. The
hypothesis of interest is

\[ \lambda_{00} = \lambda_{01}\lambda_{10}. \tag{5} \]

The effect can be tested using the relational model for rates on the class
$\mathbf{S}$ consisting of two subsets: $\mathbf{S} = \{S_1, S_2\}$, where
$S_1 = \{(0,0), (0,1)\}$ and $S_2 = \{(0,0), (1,0)\}$:

\[ \log \lambda = A'\beta. \]

Here, the model matrix is

\[ A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \]

and $\beta = (\beta_1, \beta_2)'$. The relationship between the two forms of
the model will be explored in the next section. □

Table 1: Notation for the intensities.

              Fish
Sugarcane     Yes              No
Yes           $\lambda_{00}$   $\lambda_{01}$
No            $\lambda_{10}$   -

Table 2: Number of trapped Charybdis japonica by bait type.

              Fish
Sugarcane     Yes    No
Yes           36     2
No            11     -

Table 3: Number of trapped Portunus pelagicus by bait type.

              Fish
Sugarcane     Yes    No
Yes           71     3
No            44     -




2 Parameterizations and Degrees of Freedom 

A choice of subsets in $\mathbf{S} = \{S_1, \dots, S_J\}$ is implied by the
statistical problem, and the relational model $RM(\mathbf{S})$ can be
parameterized with different model matrices, which may be useful depending on
the substantive meaning of the model. Sometimes a particular choice of
subsets leads to a model matrix $A$ with linearly dependent rows and thus
non-identifiable model parameters. To ensure identifiability, a
reparameterization, sometimes referred to as model matrix coding, is needed.
Examples of frequently used codings are reference coding, effects coding,
orthogonal coding, and polynomial coding (cf. Christensen, 1997).

Write $R(A)$ for the row space of $A$ and call it the design space of the
model. The elements of $R(A)$ are $|\mathcal{I}|$-dimensional row-vectors and
$\mathbf{1}$ denotes the row-vector with all components equal to 1.
Reparameterizations of the model have the form $\beta = C\beta_1$, where
$\beta_1$ are the new parameters of the model and $C$ is a
$J \times \mathrm{rank}(A)$ matrix such that the modified model matrix $C'A$
has full row rank and $R(A) = R(C'A)$. Then
$R(A)^{\perp} = R(C'A)^{\perp}$, that is,
$\mathrm{Ker}(A) = \mathrm{Ker}(C'A)$.

Let $\mathcal{P} = \mathcal{P}_\delta = \{P_\delta : \delta \in \mathcal{N}\}$
be the set of all positive distributions on the table $\mathcal{I}$. Here the
parameter space $\mathcal{N}$ is an open subset of
$\mathbb{R}^{|\mathcal{I}|}$. Suppose $\Theta \subseteq \mathcal{N}$. Then the
set $\mathcal{P}_\Theta = \{P_\delta : \delta \in \Theta \subseteq \mathcal{N}\}$
is a model in $\mathcal{P}_\delta$. The number of degrees of freedom of the
model $\mathcal{P}_\Theta$ is the difference between the dimensionalities of
$\mathcal{N}$ and $\Theta$.

Theorem 2.1. The number of degrees of freedom in a relational model
$RM(\mathbf{S})$ is $|\mathcal{I}| - \dim R(A)$.

Proof. Let $\delta = p = (p(1), \dots, p(|\mathcal{I}|))'$. Since
$\sum_{i \in \mathcal{I}} p(i) = 1$, the parameter space $\mathcal{N}$ is
$(|\mathcal{I}| - 1)$-dimensional. If $RM(\mathbf{S})$ is a relational model
for probabilities (3), its multiplicative parameters $\theta$ must satisfy
the normalizing equation

\[ \sum_{i \in \mathcal{I}} \prod_{j=1}^{J} (\theta_j)^{I_j(i)} = 1. \tag{6} \]

Since the model matrix is full row rank, the set
$\Theta = \{\theta \in \mathbb{R}^J_{+} : \sum_{i \in \mathcal{I}} \prod_{j=1}^{J} (\theta_j)^{I_j(i)} = 1\}$
is a $(J-1)$-dimensional surface in $\mathbb{R}^J$. Therefore, the number of
degrees of freedom of $RM(\mathbf{S})$ is
$\dim \mathcal{N} - \dim \Theta = |\mathcal{I}| - 1 - (J - 1) = |\mathcal{I}| - \dim R(A)$.

Let $\delta = \lambda$ and let $RM(\mathbf{S})$ be a model for intensities.
In this case, $\mathcal{N} = \{\lambda \in \mathbb{R}^{|\mathcal{I}|}_{+}\}$
and $\Theta \subseteq \mathcal{N}$ consists of all $\lambda$ satisfying (3).
Since no normalization is needed, $\dim \mathcal{N} = |\mathcal{I}|$ and
$\dim \Theta = \dim R(A)$, and hence the number of degrees of freedom of
$RM(\mathbf{S})$ is equal to $|\mathcal{I}| - \dim R(A)$. □

The theorem implies that the number of degrees of freedom of the relational
model coincides with $\dim \mathrm{Ker}(A)$. This is consistent with the fact
that the kernel of the model matrix is invariant under reparameterizations of
the model (2). To restrict further analysis to models with a positive number
of degrees of freedom, suppose in the sequel that $\mathrm{Ker}(A)$ is
non-trivial. Without loss of generality, suppose further that the model
matrix is full row rank.
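
The count in Theorem 2.1 can be checked numerically. The sketch below is one possible way to do it, not the authors' procedure: it computes the rank of an integer model matrix by exact Gaussian elimination over the rationals and subtracts it from the number of cells. The helper names `rank` and `degrees_of_freedom` are hypothetical; the two matrices are built from the descriptions in Examples 1.1 and 1.2.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of an integer matrix via exact row reduction with Fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                   # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(r + 1, len(m)):      # clear the column below the pivot
            factor = m[k][col] / m[r][col]
            m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
    return r

def degrees_of_freedom(A):
    """Theorem 2.1: number of cells minus the dimension of the row space."""
    return len(A[0]) - rank(A)

# Example 1.2: three cells, two subsets.
A_crab = [[1, 1, 0], [1, 0, 1]]

# Example 1.1: rows built from the verbal description of the cylinder sets.
cells = list(product((0, 1), repeat=3))
A_ci = [[1] * 8,
        [i for i, j, k in cells],
        [j for i, j, k in cells],
        [k for i, j, k in cells],
        [i * k for i, j, k in cells],
        [j * k for i, j, k in cells]]

df_crab = degrees_of_freedom(A_crab)
df_ci = degrees_of_freedom(A_ci)
```

Because the elimination is exact, the result does not depend on a numerical tolerance, which matters when rows of a model matrix are nearly or exactly dependent.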

Definition 2.1. A matrix $D$ with rows that form a basis of
$\mathrm{Ker}(A)$ is called a kernel basis matrix of the relational model
$RM(\mathbf{S})$.

The representation (2) is a primal (intuitive) representation of relational
models; a dual representation is described in the following theorem.

Theorem 2.2. (i) The distribution, parameterized by $\delta$, belongs to the
relational model $RM(\mathbf{S})$ if and only if

\[ D \log \delta = 0. \tag{7} \]

(ii) The matrix $D$ may be chosen to have integer entries.

Proof. (i) By the definition of a relational model,

\[ P_\delta \in RM(\mathbf{S}) \iff \log \delta = A'\beta. \]

The orthogonality of the design space and the null space implies that
$AD' = 0$ for any kernel basis matrix $D$. The rows of $D$ are linearly
independent. Therefore

\[ P_\delta \in RM(\mathbf{S}) \iff D \log \delta = DA'\beta = 0. \]

(ii) Since $A$ has full row rank, the dimension of $\mathrm{Ker}(A)$ is equal
to $K_0 = |\mathcal{I}| - J$.

By Corollary 4.3b (Schrijver, 1986, pg. 49), there exists a unimodular
matrix $U$, i.e. $U$ is integer and $\det U = \pm 1$, such that $AU$ is the
Hermite normal form of $A$, that is

(a) $AU$ has form $[B, 0]$;

(b) $B$ is a non-negative, non-singular, lower triangular matrix;

(c) $AU$ is an $n \times m$ matrix with entries $c_{ij}$ such that
$c_{ij} < c_{ii}$ for all $i = 1, \dots, n$, $j = 1, \dots, m$, $i \neq j$.

Let $I_{K_0}$ stand for the $K_0 \times K_0$ identity matrix, $0$ denote the
$J \times K_0$ zero matrix, and $Z$ be the following
$|\mathcal{I}| \times K_0$ matrix:

\[ Z = \begin{pmatrix} 0 \\ I_{K_0} \end{pmatrix}. \]

Since the matrix $AU$ has form $[B, 0]$ where $B$ is the nonsingular, lower
triangular, $J \times J$ matrix, then $(AU)Z = 0$.

Set $D' = UZ$. Then

\[ AD' = AUZ = 0. \tag{8} \]

The matrix $U$ is integer and nonsingular, and the columns of $Z$ are
linearly independent. Therefore the matrix $D'$ is integer and has linearly
independent columns. Hence the matrix $D$ is an integer kernel basis matrix
of the model. □

Example 1.1 (Revisited) For the model of conditional independence
$\dim \mathrm{Ker}(A) = 2$. If the kernel basis matrix is chosen as

\[ D = \begin{pmatrix} 1 & 0 & -1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 & 0 & -1 & 0 & 1 \end{pmatrix}, \]

the equation $D \log p = 0$ is equivalent to the following constraints:

\[ \frac{p_{000}\, p_{110}}{p_{010}\, p_{100}} = 1, \qquad \frac{p_{001}\, p_{111}}{p_{011}\, p_{101}} = 1. \]

The latter is a well-known representation of the model $[Y_1Y_3][Y_2Y_3]$ in
terms of the conditional odds ratios (Bishop et al., 1975). □
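
The claims of this example can be checked with a few lines of code. The sketch below is only an illustration under stated assumptions: the model matrix is rebuilt from the verbal description of its rows in Example 1.1, and the conditional probabilities used to generate a test distribution are arbitrary numbers chosen for the check.

```python
from itertools import product

# Cells of the 2 x 2 x 2 table in lexicographic order: 000, 001, ..., 111.
cells = list(product((0, 1), repeat=3))

# Rows: overall effect, level 1 of Y1, Y2, Y3, then level 1 of both
# (Y1, Y3) and of both (Y2, Y3), as described in Example 1.1.
A = [[1 for _ in cells],
     [i for i, j, k in cells],
     [j for i, j, k in cells],
     [k for i, j, k in cells],
     [i * k for i, j, k in cells],
     [j * k for i, j, k in cells]]

D = [[1, 0, -1, 0, -1, 0, 1, 0],
     [0, 1, 0, -1, 0, -1, 0, 1]]

# A D' should be the 6 x 2 zero matrix.
AD_t = [[sum(a * d for a, d in zip(row_a, row_d)) for row_d in D]
        for row_a in A]

# An arbitrary distribution with Y1 and Y2 independent given Y3.
p3 = [0.4, 0.6]
p1_given_3 = [[0.3, 0.7], [0.8, 0.2]]   # p1_given_3[k][i] = P(Y1 = i | Y3 = k)
p2_given_3 = [[0.5, 0.5], [0.1, 0.9]]   # p2_given_3[k][j] = P(Y2 = j | Y3 = k)
p = {(i, j, k): p3[k] * p1_given_3[k][i] * p2_given_3[k][j]
     for i, j, k in cells}

# The two conditional odds ratios of the dual representation.
odds_ratio_k0 = p[0, 0, 0] * p[1, 1, 0] / (p[0, 1, 0] * p[1, 0, 0])
odds_ratio_k1 = p[0, 0, 1] * p[1, 1, 1] / (p[0, 1, 1] * p[1, 0, 1])
```

Both odds ratios come out equal to 1 up to rounding, in line with the constraints implied by $D \log p = 0$.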
The dual representation (7) of a relational model is, in fact, a model
representation in terms of some monomials in $\delta$. All types of
polynomial expressions that may arise in the dual representation of a
relational model are captured by the following definition.

Definition 2.2. Let $u(i), v(i) \in \mathbb{Z}_{\geq 0}$ for all
$i \in \mathcal{I}$, $\delta^{u} = \prod_{i \in \mathcal{I}} \delta(i)^{u(i)}$
and $\delta^{v} = \prod_{i \in \mathcal{I}} \delta(i)^{v(i)}$. A generalized
odds ratio for a positive distribution, parameterized by $\delta$, is a ratio
of two monomials:

\[ \mathcal{OR} = \delta^{u} / \delta^{v}. \tag{9} \]

The odds ratio $\mathcal{OR} = \delta^{u}/\delta^{v}$ is called homogeneous if
$\sum_{i \in \mathcal{I}} u(i) = \sum_{i \in \mathcal{I}} v(i)$.

To express a relational model $RM(\mathbf{S})$ in terms of generalized odds
ratios, write the rows $d_1, d_2, \dots, d_{K_0} \in \mathbb{Z}^{|\mathcal{I}|}$
of a kernel basis matrix $D$ in terms of their positive and negative parts:

\[ d_l = d_l^{+} - d_l^{-}, \]

where $d_l^{+}, d_l^{-} \geq 0$ for all $l = 1, 2, \dots, K_0$. Then the model
(7) takes the form

\[ d_l^{+} \log \delta = d_l^{-} \log \delta, \quad \text{for } l = 1, 2, \dots, K_0, \]

which is equivalent to the model representation in terms of generalized odds
ratios:

\[ \delta^{d_l^{+}} / \delta^{d_l^{-}} = 1, \quad \text{for } l = 1, 2, \dots, K_0. \tag{10} \]

The number of degrees of freedom is equal to the minimal number of
generalized odds ratios required to uniquely specify a relational model.
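
The passage from a row of a kernel basis matrix to a generalized odds ratio can be sketched as follows. The function names are hypothetical and the intensities in the usage lines are arbitrary values that satisfy the model of Example 1.2; the code only illustrates the split into positive and negative parts and the homogeneity condition of Definition 2.2.

```python
from math import prod

def split_parts(d):
    """Positive and negative parts of an integer row: d = d_plus - d_minus."""
    d_plus = [max(x, 0) for x in d]
    d_minus = [max(-x, 0) for x in d]
    return d_plus, d_minus

def generalized_odds_ratio(d, delta):
    """Ratio of the monomials delta^{d_plus} and delta^{d_minus}."""
    d_plus, d_minus = split_parts(d)
    numerator = prod(x ** e for x, e in zip(delta, d_plus))
    denominator = prod(x ** e for x, e in zip(delta, d_minus))
    return numerator / denominator

def is_homogeneous(d):
    """Homogeneous iff both monomials have the same total degree."""
    d_plus, d_minus = split_parts(d)
    return sum(d_plus) == sum(d_minus)

# Row (1, -1, -1) encodes lambda_00 / (lambda_01 * lambda_10).
d_crab = [1, -1, -1]
lam = [10.0, 2.0, 5.0]                 # arbitrary, with 10 = 2 * 5
ratio_crab = generalized_odds_ratio(d_crab, lam)

# First row of the kernel basis matrix of the conditional independence model.
d_ci = [1, 0, -1, 0, -1, 0, 1, 0]
```

The row for Example 1.2 sums to -1, so the corresponding odds ratio is non-homogeneous, while each row of the conditional independence kernel basis sums to zero.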

Example 1.2 (Revisited) The model $\lambda_{00} = \lambda_{01}\lambda_{10}$
can be expressed in the matrix form as:

\[ D \log \lambda = 0, \tag{11} \]

where $D = (1, -1, -1)$. The matrix $D$ is a kernel basis matrix of the
relational model, as one would expect. Finally, the model representation in
terms of generalized odds ratios is

\[ \frac{\lambda_{00}}{\lambda_{01}\lambda_{10}} = 1. \]

□

The role of generalized odds ratios in parameterizing distributions in
$\mathcal{P}$ will be explored in Section 4.



3 Relational Models as Exponential Families: 
Poisson vs Multinomial Sampling 

The representation (3) implies that a relational model is an exponential
family of distributions. The canonical parameters of a relational model are
the $\beta_j$'s and the canonical statistics are the indicators of subsets
$I_j$. Relational models for intensities and relational models for
probabilities are considered in this section in more detail.

Let $RM_\lambda(\mathbf{S})$ denote a relational model for intensities and
$RM_p(\mathbf{S})$ denote a relational model for probabilities with the same
model matrix $A$, that has a full rank $J$.

If the distribution of a random vector $Y$ is parameterized by intensities
$\lambda$, then, under the model $RM_\lambda(\mathbf{S})$,

\[ P(Y = y) = \frac{1}{\prod_{i \in \mathcal{I}} y(i)!} \exp\{\beta' A y - \mathbf{1} \exp\{A'\beta\}\}. \tag{12} \]

If the distribution of $Y$ is multinomial, with parameters $N$ and $p$, then,
under the model $RM_p(\mathbf{S})$,

\[ P(Y = y) = \frac{N!}{\prod_{i \in \mathcal{I}} y(i)!} \exp\{\beta' A y\}. \tag{13} \]

Set

\[ T(Y) = AY = (T_1(Y), T_2(Y), \dots, T_J(Y))'. \tag{14} \]

For each $j \in 1, \dots, J$, the statistic
$T_j(Y) = \sum_{i \in \mathcal{I}} I_j(i) Y(i)$ is the subset sum
corresponding to the subset $S_j$.

Theorem 3.1. A model $RM_\lambda(\mathbf{S})$ is a regular exponential family
of order $J$.

Proof. The model matrix $A$ has full rank; no normalization is needed for
intensities. Therefore, the representation (12) is minimal and the
exponential family is regular, of order $J$. □

Relational models for probabilities may have a more complex structure
than relational models for intensities and, in some cases, become curved
exponential families (Efron, 1975; Brown, 1988; Kass & Vos, 1997).

Theorem 3.2. If $\mathbf{1} \in R(A)$, a model $RM_p(\mathbf{S})$ is a regular
exponential family of order $J - 1$; otherwise, it is a curved exponential
family of order $J - 1$.

Proof. Suppose that $\mathbf{1} \in R(A)$. Without loss of generality,
$\mathcal{I} = S_1 \in \mathbf{S}$ and thus

\[ P(Y = y) = \frac{N!}{\prod_{i \in \mathcal{I}} y(i)!} \exp\Big\{N\beta_1 + \sum_{j=2}^{J} \Big(\sum_{i \in \mathcal{I}} y(i) I_j(i)\Big)\beta_j\Big\}. \tag{15} \]

The exponential family representation given by (15) is minimal; the model
$RM_p(\mathbf{S})$ is a regular exponential family of order $J - 1$.

If $\mathbf{1} \notin R(A)$ then, independent of parameterization, the model
matrix does not include the row of all 1s. The normalization is required and
thus the parameter space is a manifold of dimension $J - 1$ in
$\mathbb{R}^J$ (see e.g. Rudin, 1976, p. 229). In this case,
$RM_p(\mathbf{S})$ is a curved exponential family of order $J - 1$
(Kass & Vos, 1997). □



If a relational model is a regular exponential family, the maximum
likelihood estimate of the canonical parameter exists if and only if the
observed value of the canonical statistic is contained in the interior of the
convex hull of the support of its distribution (Barndorff-Nielsen & Cox,
1994). In this case, the MLE is also unique.

It is well known for log-linear models that, when the total sample size
is fixed, the kernel of the likelihood is the same for the multinomial and
Poisson sampling schemes and thus the maximum likelihood estimates of the
cell frequencies, obtained under either sampling scheme, are equal (see e.g.
Bishop et al., 1975, p. 448). The following theorem is an extension of this
result.

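Theorem 3.3. Assume that, for a given set of observations, the maximum
likelihood estimates $\hat{\lambda}$, under the model
$RM_\lambda(\mathbf{S})$, and $\hat{p}$, under the model $RM_p(\mathbf{S})$,
exist. The following four conditions are equivalent:

(A) The MLEs for cell frequencies obtained under either model are the same.

(B) Vector $\mathbf{1}$ is in the design space $R(A)$.

(C) Both models may be defined by homogeneous odds ratios.

(D) The model for intensities is scale invariant.

Proof. (A) $\Leftarrow$ (B)

The maximum likelihood estimates for probabilities, under the model
$RM_p(\mathbf{S})$, satisfy the likelihood equations

\[ Ay = \alpha A\hat{p}, \qquad \mathbf{1}\hat{p} = 1. \tag{16} \]

Here $\alpha$ is the Lagrange multiplier.

If $\mathbf{1} \in R(A)$ then there exists a $k \in \mathbb{R}^J$ such that
$k'A = \mathbf{1}$. Multiplying both sides of the first equation in (16) by
$k'$ yields $\alpha = N$ and hence

\[ Ay = NA\hat{p}. \tag{17} \]

The maximum likelihood estimates for intensities, under
$RM_\lambda(\mathbf{S})$, satisfy the likelihood equations

\[ Ay = A\hat{\lambda}. \tag{18} \]

From the equations (17) and (18):

\[ \hat{\lambda} - N\hat{p} \in \mathrm{Ker}(A). \]

The latter implies that $\mathbf{1}(\hat{\lambda} - N\hat{p}) = 0$ and
$N = \mathbf{1}\hat{\lambda}$. Therefore $\hat{\lambda}/N$ is a probability
distribution; since $\mathbf{1} \in R(A)$, it belongs to $RM_p(\mathbf{S})$
and, by (18), satisfies (17). By the uniqueness of the MLE,
$\hat{p} = \hat{\lambda}/N$, and the maximum likelihood estimates for cell
frequencies obtained under either model are the same:

\[ \hat{y} = N\hat{p} = \hat{\lambda}. \]

(A) $\Rightarrow$ (B)

Suppose that $\hat{y} = N\hat{p} = \hat{\lambda}$. Under the model
$RM_\lambda(\mathbf{S})$

\[ \log(\hat{\lambda}) = A'\hat{\beta}_1 \]

for some $\hat{\beta}_1$. On the other hand, under the model
$RM_p(\mathbf{S})$,

\[ \log(\hat{\lambda}) = \log(N\hat{p}) = A'\hat{\beta}_2 + (\log N)\mathbf{1}' \]

for some $\hat{\beta}_2$. The condition
$A'\hat{\beta}_1 = A'\hat{\beta}_2 + (\log N)\mathbf{1}'$ can only hold if
$\mathbf{1} \in R(A)$.

(B) $\Leftrightarrow$ (C)

The vector $\mathbf{1} \in R(A)$ if and only if all rows of a kernel basis
matrix $D$ are orthogonal to $\mathbf{1}$, that is, the sum of entries in
every row of $D$ is zero. The latter is equivalent to the generalized odds
ratios obtained from rows of $D$ being homogeneous.

(D) $\Leftrightarrow$ (B)

Let $t > 0$, $t \neq 1$. Then

\[ D \log(t\lambda) = 0 \iff \log t \cdot (D\mathbf{1}') = 0 \iff D\mathbf{1}' = 0, \text{ or } \mathbf{1} \in R(A). \]

□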

Corollary 3.4. For a given set of observations, the MLEs of the subset
sums under a model $RM_p(\mathbf{S})$ are equal to their observed values if
and only if $\mathbf{1} \in R(A)$.

Proof. If $\mathbf{1} \in R(A)$ the model $RM_p(\mathbf{S})$ is a regular
exponential family. The subset sums are canonical statistics; their MLEs are
the same as observed.

Suppose that the MLEs of the subset sums are equal to their observed
values. Then $NA\hat{p} = Ay$ and thus $\hat{p} - y/N \in \mathrm{Ker}(A)$.
Since $\sum_{i} (\hat{p}(i) - y(i)/N) = 1 - 1 = 0$, the vector
$\hat{p} - y/N$ is orthogonal to $\mathbf{1}$ and thus
$\mathbf{1} \in R(A)$. □

Corollary 3.5. Suppose $\mathbf{1} \notin R(A)$. For a given set of
observations, the MLEs, if they exist, of the subset sums under a model
$RM_p(\mathbf{S})$ are proportional to their observed values.

Proof. In this case the value of $\alpha$ cannot be found from (16) and one
can only assert that

\[ A\hat{y} = \frac{N}{\alpha} Ay. \]

□



Example 1.2 illustrates a situation when a relational model for intensities
is not scale invariant. This model is a curved exponential family. The
existence and uniqueness of the maximum likelihood estimates in such
relational models is proved next.

Theorem 3.6. Let $Y \sim M(N, p)$, $y$ be a realization of $Y$, and
$RM_p(\mathbf{S})$ be a relational model, given $\mathbf{1} \notin R(A)$. The
maximum likelihood estimate for $p$, under the model $RM_p(\mathbf{S})$,
exists and is unique if and only if $T(y) > 0$.

Proof. A point in the canonical parameter space of the model
$RM_p(\mathbf{S})$ that maximizes the log-likelihood subject to the
normalization constraint is a solution to the optimization problem:

\[ \max\ l(\beta; y), \quad \text{s.t. } \beta \in \mathcal{D}, \]

where

\[ l(\beta; y) = T_1(y)\beta_1 + \dots + T_J(y)\beta_J \]

and

\[ \mathcal{D} = \Big\{\beta \in \mathbb{R}^J : \sum_{i \in \mathcal{I}} \exp\Big\{\sum_{j=1}^{J} I_j(i)\beta_j\Big\} - 1 = 0\Big\}. \]

The set $\mathcal{D}$ is non-empty and is a level set of a convex function.
The level sets of convex functions are not convex in general. However the
sub-level sets of convex functions and hence the set

\[ \mathcal{D}_{\leq} = \Big\{\beta \in \mathbb{R}^J : \sum_{i \in \mathcal{I}} \exp\Big\{\sum_{j=1}^{J} I_j(i)\beta_j\Big\} - 1 \leq 0\Big\} \]

are convex.

The set of maxima of $l(\beta; y)$ over the set $\mathcal{D}_{\leq}$ is
nonempty and consists of a single point if and only if (Bertsekas, 2009,
Section 3)

\[ R_{\mathcal{D}_{\leq}} \cap R_{-l} = L_{\mathcal{D}_{\leq}} \cap L_{-l}. \]

Here $R_{\mathcal{D}_{\leq}}$ is the recession cone of the set
$\mathcal{D}_{\leq}$, $R_{-l}$ is the recession cone of the function $-l$,
$L_{\mathcal{D}_{\leq}}$ is the lineality space of $\mathcal{D}_{\leq}$, and
$L_{-l}$ is the lineality space of $-l$.

The recession cone of $\mathcal{D}_{\leq}$ is the orthant
$\mathbb{R}^J_{-}$, including the origin; the lineality space is
$L_{\mathcal{D}_{\leq}} = \{0\}$. The lineality space of the function $-l$ is
the plane passing through the origin, with the normal $T(y)$; the recession
cone of $-l$ is the half-space above this plane. The condition
$R_{\mathcal{D}_{\leq}} \cap R_{-l} = L_{\mathcal{D}_{\leq}} \cap L_{-l} = \{0\}$
holds if and only if all components of
$T(y) = (T_1(y), \dots, T_J(y))'$ are positive.

The function $l(\beta; y)$ is linear; its maximum is achieved on
$\mathcal{D}$. Therefore there exists one and only one $\hat{\beta}$ which
maximizes the likelihood over the canonical parameter space, and the maximum
likelihood estimate for $p$, under the model $RM_p(\mathbf{S})$, exists and
is unique. □



Table 4: The MLEs for the number of trapped Charybdis japonica by bait type.

              Fish
Sugarcane     Yes      No
Yes           35.06    2.94
No            11.94    -

Table 5: The MLEs for the number of trapped Portunus pelagicus by bait type.

              Fish
Sugarcane     Yes      No
Yes           72.31    1.69
No            42.69    -


Example 1.2 (Revisited) In this example, the relational model for
intensities is not scale invariant. The maximum likelihood estimates for the
cell frequencies exist and are shown in Tables 4 and 5. The observed
Pearson's statistics are $X^2 = 0.40$ and $X^2 = 1.07$ respectively, on one
degree of freedom.

□
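
The fitted values reported for this example can be reproduced with a short calculation. The sketch below is not the authors' algorithm: it assumes the cell ordering (0,0), (0,1), (1,0) and uses the fact that, for this three-cell model, the likelihood equations $Ay = A\hat{\lambda}$ together with $\lambda_{00} = \lambda_{01}\lambda_{10}$ reduce to a quadratic equation. The function names are hypothetical.

```python
from math import sqrt

def fit_crab_model(y00, y01, y10):
    """MLE under lambda00 = lambda01 * lambda10 for Poisson counts.

    With t1 = lambda01 and t2 = lambda10 the likelihood equations read
        t1 * t2 + t1 = y00 + y01,
        t1 * t2 + t2 = y00 + y10,
    so t2 = t1 + c with c = y10 - y01, and t1 solves
        t1**2 + (c + 1) * t1 - (y00 + y01) = 0.
    """
    s1 = y00 + y01
    c = y10 - y01
    t1 = (-(c + 1) + sqrt((c + 1) ** 2 + 4 * s1)) / 2   # positive root
    t2 = t1 + c
    return [t1 * t2, t1, t2]          # fitted (lambda00, lambda01, lambda10)

def pearson_statistic(observed, fitted):
    """Pearson's X^2 = sum (observed - fitted)^2 / fitted."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))

charybdis = [36, 2, 11]               # counts from Table 2
portunus = [71, 3, 44]                # counts from Table 3

fit_charybdis = fit_crab_model(*charybdis)
fit_portunus = fit_crab_model(*portunus)
x2_charybdis = pearson_statistic(charybdis, fit_charybdis)
x2_portunus = pearson_statistic(portunus, fit_portunus)
```

Rounded to two decimals, the fitted values and the two Pearson statistics agree with the figures quoted in the example.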

The relational models framework deals with models generated by subsets
of cells, and the model matrix for a relational model is an indicator matrix
that has only 0-1 entries. Theorems 2.2 and 3.3 hold if the model matrix has
non-negative integer entries. The next example illustrates how the techniques
and theorems apply to some discrete exponential models.

Example 3.1. This example, given in Agresti (2002), describes the study
carried out to determine if a pneumonia infection has an immunizing effect
on dairy calves. Within 60 days after birth, the calves were exposed to a
pneumonia infection. The calves that got the infection were then classified
according to whether or not they got the secondary infection within two weeks
after the first infection cleared up. The number of the infected calves is
thus a random variable with the multinomial distribution
$M(N, (p_{11}, p_{12}, p_{22})')$, where $N$ denotes the total number of
calves in the sample. Suppose further that $p_{11}$ is the probability to get
both the primary and the secondary infection, $p_{12}$ is the probability to
get only the primary infection and not the secondary one, and $p_{22}$ is the
probability not to catch either the primary or the secondary infection. Let
$0 < \pi < 1$ denote the probability to get the primary infection. The
hypothesis of no immunizing effect of the primary infection is expressed as
(cf. Agresti, 2002)

\[ p_{11} = \pi^2, \quad p_{12} = \pi(1 - \pi), \quad p_{22} = 1 - \pi. \tag{19} \]

Since the model (19) is also expressed in terms of non-homogeneous odds
ratios:

\[ \frac{p_{11}\, p_{22}^2}{p_{12}^2} = 1, \]

then it is a relational model for probabilities, without the overall effect.

Write $N_{11}, N_{12}, N_{22}$ for the number of calves in each category and
$n_{11}, n_{12}, n_{22}$ for their realizations. The log-likelihood is
proportional to

\[ (2n_{11} + n_{12}) \log \pi + (n_{12} + n_{22}) \log(1 - \pi). \]

The canonical statistic
$T = (T_1, T_2) = (2N_{11} + N_{12}, N_{12} + N_{22})$ is two-dimensional; the
canonical parameter space
$\{(\log \pi, \log(1 - \pi)) : \pi \in (0, 1)\}$ is the curve in
$\mathbb{R}^2$ shown in Figure 1. The model (19) is thus a curved exponential
family of order 1.

Figure 1: The canonical parameter space in Example 3.1.

The likelihood is maximized by

\[ \hat{\pi} = \frac{2n_{11} + n_{12}}{2n_{11} + 2n_{12} + n_{22}} = \frac{T_1}{T_1 + T_2}, \]

where $T_1 = 2n_{11} + n_{12}$ and $T_2 = n_{12} + n_{22}$ are observed
components of the canonical statistic, or subset sums. The MLEs of the subset
sums can be expressed in terms of their observed values as

\[ \hat{T}_1 = N(2\hat{\pi}^2 + \hat{\pi}(1 - \hat{\pi})) = N\Big(\frac{2T_1^2}{(T_1 + T_2)^2} + \frac{T_1 T_2}{(T_1 + T_2)^2}\Big) = T_1 \frac{N(2T_1 + T_2)}{(T_1 + T_2)^2}, \]

\[ \hat{T}_2 = N(\hat{\pi}(1 - \hat{\pi}) + (1 - \hat{\pi})) = N\Big(\frac{T_1 T_2}{(T_1 + T_2)^2} + \frac{T_2}{T_1 + T_2}\Big) = T_2 \frac{N(2T_1 + T_2)}{(T_1 + T_2)^2}. \]

Thus, under the model (19), the MLEs of the subset sums differ from their
observed values by the factor $N(2T_1 + T_2)/(T_1 + T_2)^2$. For the data and
the MLEs in Table 6 this factor is approximately 0.936. □
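
The numbers quoted in this example can be retraced directly from the observed counts in Table 6. The following sketch simply evaluates the closed-form expressions given above; the variable names are illustrative.

```python
# Observed counts from Table 6: (primary, secondary) = (yes, yes),
# (yes, no) and (no, no).
n11, n12, n22 = 30, 63, 63
N = n11 + n12 + n22

# Observed subset sums (components of the canonical statistic).
T1 = 2 * n11 + n12
T2 = n12 + n22

# Closed-form maximizer of the likelihood under model (19).
pi_hat = T1 / (T1 + T2)

# Fitted cell frequencies N * (pi^2, pi * (1 - pi), 1 - pi).
fitted = [N * pi_hat ** 2, N * pi_hat * (1 - pi_hat), N * (1 - pi_hat)]

# MLEs of the subset sums and the common proportionality factor.
T1_hat = 2 * fitted[0] + fitted[1]
T2_hat = fitted[1] + fitted[2]
factor = N * (2 * T1 + T2) / (T1 + T2) ** 2
```

Both ratios `T1_hat / T1` and `T2_hat / T2` coincide with `factor`, which illustrates the proportionality stated in Corollary 3.5 for models without the overall effect.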



4 Mixed Parameterization of Exponential Fam- 
ilies 

Table 6: Observed Counts for Primary and Secondary Pneumonia Infection
of Calves. The MLEs are shown in parentheses (Agresti, 2002).

                     Secondary Infection
Primary Infection    Yes          No
Yes                  30 (38.1)    63 (39.0)
No                   -            63 (78.9)

Let $\mathcal{P}_\delta$ be an exponential family formed by all strictly
positive distributions on $\mathcal{I}$ and $\log \delta$ be the canonical
parameters of this family. Denote by $\mathcal{P}_\gamma$ the
reparameterization of $\mathcal{P}_\delta$ defined by the following
one-to-one mapping:

\[ \log \delta = M'\gamma, \tag{20} \]

where $M$ is a full rank, $|\mathcal{I}| \times |\mathcal{I}|$, integer
matrix, and $\gamma \in \mathbb{R}^{|\mathcal{I}|}$. It was shown by Brown
(1988) that $\mathcal{P}_\gamma$ is an exponential family with the canonical
parameters $\gamma$.

Theorem 4.1. The canonical parameters of $\mathcal{P}_\gamma$ are the
generalized log odds ratios in terms of $\delta$.

Proof. Since the matrix $M$ is full rank, then

\[ \gamma = (M')^{-1} \log \delta. \tag{21} \]

Let $B$ denote the adjoint matrix to $M'$ and write
$b_1, \dots, b_{|\mathcal{I}|}$ for the rows of $B$. The components of
$\gamma$ can be expressed as:

\[ \gamma_i = \frac{1}{\det(M)} \log \delta^{b_i}, \quad \text{for } i = 1, \dots, |\mathcal{I}|. \tag{22} \]

All rows of $B$ are integer vectors and thus the components of $\gamma$ are
multiples of the generalized log odds ratios. The common factor
$1/\det(M) \neq 0$ can be included in the canonical statistics, and the
canonical parameters become equal to the generalized log odds ratios. □

Let $A$ be a full row rank $J \times |\mathcal{I}|$ matrix with non-negative
integer entries, and $D$ denote a kernel basis matrix of $A$. Set

\[ M = \begin{pmatrix} A \\ D \end{pmatrix}, \tag{23} \]

find the inverse of $M$ and partition it as

\[ M^{-1} = [A^{-}, D^{-}]. \]

Since $DA' = 0$, then $(D^{-})'A^{-} = 0$. This matrix $M$ can be used to
derive a mixed parameterization of $\mathcal{P}$ with variation independent
parameters (cf. Brown, 1988; Hoffmann-Jørgensen, 1994). Under this
parameterization

\[ \zeta = \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}, \tag{24} \]

where $\zeta_1 = A\delta$ (mean-value parameters) and
$\zeta_2 = D^{-} \log \delta$ (canonical parameters), and the range of the
vector $(\zeta_1, \zeta_2)'$ is the Cartesian product of the separate ranges
of $\zeta_1$ and $\zeta_2$.

Another mixed parameterization, which does not require calculating the
inverse of $M$, may be obtained as follows. Notice first that for any
$\delta \in \mathcal{P}$ there exist unique vectors $\beta \in \mathbb{R}^J$
and $\theta \in \mathbb{R}^{|\mathcal{I}| - J}$ such that

\[ \log \delta = A'\beta + D'\theta. \tag{25} \]

By orthogonality,

\[ D \log \delta = DA'\beta + DD'\theta = DD'\theta, \qquad \theta = (DD')^{-1} D \log \delta. \tag{26} \]

Because of the uniqueness, $D^{-} = (DD')^{-1}D$. Moreover, since there is a
one-to-one correspondence between $\zeta_2$ and
$\tilde{\zeta}_2 = D \log \delta$, then, in the mixed parameterization, the
parameter $\zeta_2$ can be replaced with $\tilde{\zeta}_2$. The components of
$\tilde{\zeta}_2 = D \log \delta$ are some generalized log odds ratios as
well.

A relational model is clearly defined and parameterized in the mixed 
parameterization derived from the model matrix of this model. In this pa- 
rameterization the model requires logs of the generalized odds ratios to be 
zero and distributions in this model are parameterized by the remaining 
mean-value parameters. 

The following two examples illustrate the proposed mixed parameteriza- 
tion. 



Example 1.1 (Revisited) Consider a $2 \times 2 \times 2$ contingency table and the matrices $A$ and $D$ given for this example earlier. From (25):

$$\log p = A'\beta + \theta_1 \cdot (1, 0, -1, 0, -1, 0, 1, 0)' + \theta_2 \cdot (0, 1, 0, -1, 0, -1, 0, 1)'$$

for some $\beta \in \mathbb{R}^{6}$ and $\theta_1, \theta_2 \in \mathbb{R}$.

Since the rows of $D$ are mutually orthogonal,

$$(1, 0, -1, 0, -1, 0, 1, 0)\log p = 4\theta_1,$$
$$(0, 1, 0, -1, 0, -1, 0, 1)\log p = 4\theta_2.$$

Thus, $\theta_1 = \frac{1}{4}\log\frac{p_{111}p_{221}}{p_{121}p_{211}}$ and $\theta_2 = \frac{1}{4}\log\frac{p_{112}p_{222}}{p_{122}p_{212}}$, as is well known (see, e.g., Bishop et al., 1975).

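The parameters $\beta$ can be expressed as generalized log odds ratios by applying (22):

$$\beta_1 = \frac{1}{4}\log\frac{p_{111}^{3}\,p_{121}\,p_{211}}{p_{221}}, \qquad \beta_2 = \frac{1}{4}\log\frac{p_{211}^{2}\,p_{221}^{2}}{p_{111}^{2}\,p_{121}^{2}},$$

$$\beta_3 = \frac{1}{4}\log\frac{p_{121}^{2}\,p_{221}^{2}}{p_{111}^{2}\,p_{211}^{2}}, \qquad \beta_4 = \frac{1}{4}\log\frac{p_{112}^{3}\,p_{122}\,p_{212}\,p_{221}}{p_{111}^{3}\,p_{121}\,p_{211}\,p_{222}},$$

$$\beta_5 = \frac{1}{4}\log\frac{p_{111}^{2}\,p_{121}^{2}\,p_{212}^{2}\,p_{222}^{2}}{p_{112}^{2}\,p_{122}^{2}\,p_{211}^{2}\,p_{221}^{2}}, \qquad \beta_6 = \frac{1}{4}\log\frac{p_{111}^{2}\,p_{122}^{2}\,p_{211}^{2}\,p_{222}^{2}}{p_{112}^{2}\,p_{121}^{2}\,p_{212}^{2}\,p_{221}^{2}}.$$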



The mean-value parameters for this family are $\zeta_1 = NAp$ (the expected values of the subset sums). The mixed parameterization consists of the mean-value parameters and the canonical parameters $\zeta_2 = (\theta_1, \theta_2)'$ or $\tilde{\zeta}_2 = D\log p$.

□ 
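As a numerical illustration of the orthogonality argument in this example, the following minimal sketch computes $\theta = (DD')^{-1}D\log p$ from the two rows of $D$ and compares it with the quarter log odds ratios. The probability vector `p` and the function name `theta_from_probabilities` are hypothetical and chosen only for illustration; the cell order 111, 112, 121, 122, 211, 212, 221, 222 is assumed.

```python
import math

# Rows of the kernel basis matrix D from the example; cells are assumed to be
# ordered 111, 112, 121, 122, 211, 212, 221, 222.
D = [
    [1, 0, -1, 0, -1, 0, 1, 0],
    [0, 1, 0, -1, 0, -1, 0, 1],
]


def theta_from_probabilities(p):
    """Return theta = (D D')^{-1} D log p.

    The rows of D are mutually orthogonal, so D D' is diagonal and each
    component is a contrast of log p divided by the squared row norm (4).
    """
    log_p = [math.log(x) for x in p]
    theta = []
    for row in D:
        squared_norm = sum(d * d for d in row)
        contrast = sum(d * lp for d, lp in zip(row, log_p))
        theta.append(contrast / squared_norm)
    return theta


# Hypothetical strictly positive distribution (illustrative values only).
p = [0.10, 0.05, 0.20, 0.15, 0.05, 0.10, 0.15, 0.20]
p111, p112, p121, p122, p211, p212, p221, p222 = p

theta1, theta2 = theta_from_probabilities(p)

# Direct evaluation of the conditional log odds ratios, scaled by 1/4.
direct1 = 0.25 * math.log((p111 * p221) / (p121 * p211))
direct2 = 0.25 * math.log((p112 * p222) / (p122 * p212))
```

Both routes give the same values up to floating-point error, which is the content of the identities $4\theta_1 = (1,0,-1,0,-1,0,1,0)\log p$ and $4\theta_2 = (0,1,0,-1,0,-1,0,1)\log p$.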

Some models, more general than relational models, can be specified by
setting generalized odds ratios equal to positive constants. An example of
such a model is given next.

Example 4.1. The Hardy-Weinberg distribution arising in genetics was discussed as an exponential family in Barndorff-Nielsen (1978) and Brown (1988), among others. Assume that a parent population contains alleles $G$ and $g$ with probabilities $\pi$ and $1 - \pi$, respectively. The numbers of genotypes $GG$, $Gg$, and $gg$ that appear in a generation of descendants form a random vector with an $M(N, p)$ distribution. Under the model of random mating and no selection, the vector of probabilities $p$ has components

$$p_1 = \pi^2, \qquad p_2 = 2\pi(1 - \pi), \qquad p_3 = (1 - \pi)^2. \qquad (28)$$



The model (28) is a one-parameter regular exponential family with the canonical parameter $\log\frac{\pi}{1-\pi}$. This model is slightly more general than relational models, but the techniques used for relational models apply. The model representation in terms of homogeneous odds ratios is

$$\frac{p_2^2}{p_1 p_3} = 4. \qquad (29)$$

If the kernel basis matrix is chosen as $D = (-1, 2, -1)$ and $A$ is a model matrix satisfying $DA' = 0$, the model (29) can be expressed as

$$D\log p = 2\log 2.$$



There exists a mixed parameterization of the family of multinomial distributions of the form

$$\log p = A'\beta + D'\theta. \qquad (30)$$

Here $\beta = (\beta_1, \beta_2)'$ and $\theta \in (-\infty, \infty)$. From equation (26):

$$\theta = \frac{1}{6}\log\frac{p_2^2}{p_1 p_3}.$$

The parameter $\theta$ may be interpreted as a measure of the strength of selection in favor of the heterozygote character $Gg$ (cf. Brown, 1988).

The condition $D\log p = \log 4$ is equivalent to setting the parameter $\theta$ equal to $\frac{1}{6}\log 4$.

□ 
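The decomposition (30) and the formula (26) can be checked numerically. The sketch below is illustrative only: the model matrix `A` (rows counting the alleles $G$ and $g$ in each genotype) is an assumption made here because it satisfies $DA' = 0$, and the helper names `hardy_weinberg`, `mixed_coordinates` and `reconstruct` are hypothetical. It computes $\theta = (DD')^{-1}D\log p$ and $\beta = (AA')^{-1}A\log p$, then rebuilds $p$ from $(\beta, \theta)$.

```python
import math

D = [-1, 2, -1]              # kernel basis matrix from the example
A = [[2, 1, 0], [0, 1, 2]]   # ASSUMED model matrix (allele counts); D A' = 0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hardy_weinberg(pi):
    """Genotype probabilities (GG, Gg, gg) under equation (28)."""
    return [pi ** 2, 2 * pi * (1 - pi), (1 - pi) ** 2]


def mixed_coordinates(p):
    """Return (beta, theta) with log p = A' beta + D' theta."""
    log_p = [math.log(x) for x in p]
    theta = dot(D, log_p) / dot(D, D)          # equation (26); D D' = 6
    # beta solves the 2x2 system (A A') beta = A log p, since A D' = 0.
    g11, g12, g22 = dot(A[0], A[0]), dot(A[0], A[1]), dot(A[1], A[1])
    r1, r2 = dot(A[0], log_p), dot(A[1], log_p)
    det = g11 * g22 - g12 * g12
    beta = [(g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]
    return beta, theta


def reconstruct(beta, theta):
    """Invert the parameterization: p = exp(A' beta + D' theta)."""
    return [
        math.exp(beta[0] * A[0][i] + beta[1] * A[1][i] + theta * D[i])
        for i in range(3)
    ]


# Under Hardy-Weinberg, theta does not depend on pi.
p_hw = hardy_weinberg(0.3)
beta_hw, theta_hw = mixed_coordinates(p_hw)

# An arbitrary positive distribution (hypothetical values) is also recovered.
q = [0.2, 0.5, 0.3]
beta_q, theta_q = mixed_coordinates(q)
q_back = reconstruct(beta_q, theta_q)
```

For any `pi` in (0, 1) the value `theta_hw` equals `math.log(4) / 6`, while a distribution such as `q`, which violates (29), has a different `theta` and is still reproduced exactly by `reconstruct`.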



It is well known for a multidimensional contingency table that marginal distributions are variation independent from conditional odds ratios. Properly selected conditional odds ratios and sets of marginal distributions determine the distribution of the table uniquely (Barndorff-Nielsen, 1976; Rudas, 1998; Bergsma & Rudas, 2003). A generalization of this fact to the set $\mathcal{I}$ is given in the following theorem.

Theorem 4.2. Let $\mathcal{P}$ be the set of all positive distributions on the table $\mathcal{I}$. Suppose $A$ is a non-negative integer matrix of full row rank and $D$ is a kernel basis matrix of $A$. Then the following statements hold:

(i) For any $P_{\delta_1}, P_{\delta_2} \in \mathcal{P}$ there exist a distribution $P_{\delta} \in \mathcal{P}$ and a scalar $\alpha$ such that

$$A\delta = \alpha A\delta_1 \quad \text{and} \quad D\log\delta = D\log\delta_2.$$

(ii) The coefficient of proportionality $\alpha = 1$ if and only if $\mathbf{1} \in R(A)$.

The proof is straightforward: it follows from two corollaries of Section 3 and is omitted here.



5 Applications 

The first example features relational models as a potential tool for modeling 
social mobility tables. A model of independence is considered on a space that 
is not the Cartesian product of the domains of the variables in the table. 

Example 5.1. Social mobility tables often express a relation between statuses of two generations, for example, the relation between occupational statuses of respondents and their fathers, as in Table 7 (Blau & Duncan, 1967).

To test the hypothesis of independence between respondent's mobility and father's status, consider the Respondent's mobility variable with three categories: Upward mobile (moving up compared to father's status), Immobile (staying at the same status), and Downward mobile (moving down compared to father's status). The initial table is then transformed into Table 8.



Table 7: Occupational Changes in a Generation, 1962

                         Respondent's occupation
Father's occupation    White-collar    Manual    Farm
White-collar                   6313      2644     132
Manual                         6321     10883     294
Farm                           2495      6124    2471



Table 8: Father's occupation vs Respondent's mobility. The MLEs are shown in parentheses.

                         Respondent's mobility
Father's occupation    Upward            Immobile           Downward
White-collar           -                 6313 (7518.17)     2776 (1570.83)
Manual                 6321 (8823.66)    10883 (7175.18)    294 (1499.17)
Farm                   8619 (6116.34)    2471 (4973.66)     -

Since respondents cannot move up from the highest status or down from the lowest status, the cells (1, 1) and (3, 3) in Table 8 do not exist. The set of cells $\mathcal{I}$ is a proper subset of the Cartesian product of the domains of the variables in the table. Let $S$ be the class consisting of the cylindrical sets associated with the marginals, including the empty one. The relational model generated by $S$ has the model matrix $A$ and is expressed in terms of local odds ratios as follows:

$$\frac{p_{12}\,p_{23}}{p_{13}\,p_{22}} = 1, \qquad \frac{p_{21}\,p_{32}}{p_{22}\,p_{31}} = 1.$$

This model is a regular exponential family of order 4; the maximum likelihood estimates of cell frequencies exist and are unique. (The estimates are shown in Table 8 next to the observed values.) The observed $X^2 = 6995.83$ on two degrees of freedom provides evidence of a strong association between father's occupation and respondent's mobility. □
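The fitted values in Table 8 can be reproduced approximately with a short computation. The sketch below assumes that, because the overall effect is present, the model coincides with quasi-independence on the seven existing cells, so that the MLE matches the observed row and column totals. It uses iterative proportional fitting, which is a standard technique swapped in here and not necessarily the algorithm used by the authors. The function names `fit_quasi_independence` and `pearson_x2` are hypothetical.

```python
# Observed counts from Table 8; None marks a cell that does not exist.
observed = [
    [None, 6313.0, 2776.0],     # White-collar fathers: no upward move
    [6321.0, 10883.0, 294.0],   # Manual
    [8619.0, 2471.0, None],     # Farm fathers: no downward move
]


def fit_quasi_independence(table, iterations=500):
    """Iterative proportional fitting on row and column totals.

    Non-existing cells are kept at 0 throughout; existing cells start at 1
    and are rescaled until both sets of margins match the observed ones.
    """
    rows, cols = len(table), len(table[0])
    counts = [[0.0 if x is None else x for x in row] for row in table]
    fitted = [[0.0 if x is None else 1.0 for x in row] for row in table]
    for _ in range(iterations):
        for i in range(rows):                      # match row totals
            scale = sum(counts[i]) / sum(fitted[i])
            fitted[i] = [x * scale for x in fitted[i]]
        for j in range(cols):                      # match column totals
            observed_total = sum(counts[i][j] for i in range(rows))
            fitted_total = sum(fitted[i][j] for i in range(rows))
            for i in range(rows):
                fitted[i][j] *= observed_total / fitted_total
    return fitted


def pearson_x2(table, fitted):
    """Pearson chi-squared statistic over the existing cells only."""
    total = 0.0
    for row_obs, row_fit in zip(table, fitted):
        for obs, fit in zip(row_obs, row_fit):
            if obs is not None:
                total += (obs - fit) ** 2 / fit
    return total


fitted = fit_quasi_independence(observed)
x2 = pearson_x2(observed, fitted)
```

With these inputs, `fitted` agrees with the parenthesized values in Table 8 to about two decimals, and `x2` is close to the reported 6995.83.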

The next example illustrates the usefulness of relational models for net- 
work analysis. 

Example 5.2. Table 9 shows the total trade data between seven European countries that was collected from the United Nations Commodity Trade Statistics Database (2007). Every cell contains the value of trade volume for a pair of countries; cell counts are assumed to have a Poisson distribution. The two hypotheses of interest are: countries with larger economies generate more trade, and trade volume between two countries is higher if they use the same currency. In this example, GDP (gross domestic product) is chosen as the characteristic of economy and Eurozone membership is chosen as the common currency indicator. The class $S$ includes five subsets of cells reflecting the GDP size:

{GDP < 0.1 · 10^ vs GDP < 0.1 · 10^},

{GDP < 0.1 · 10^ vs 0.1 · 10^ < GDP < 0.6 · 10^},

{GDP < 0.1 · 10^ vs GDP > 0.6 · 10^},

{0.1 · 10^ < GDP < 0.6 · 10^ vs 0.1 · 10^ < GDP < 0.6 · 10^},

{0.1 · 10^ < GDP < 0.6 · 10^ vs GDP > 0.6 · 10^},
and three subsets reflecting Eurozone membership:

{cells showing trade between two Eurozone members},

{cells showing trade between a Eurozone member and a non-member},

{cells showing trade between two Eurozone non-members}.

Under the model generated by S, trade volume is the product of the GDP 
effect and the Eurozone membership effect. 



Table 9: Total trade between seven countries (in billions of US dollars). The MLEs are shown in parentheses.

       LV     NLD           FIN         EST         SWE           BEL           LUX
LV     [0]    0.7 (3.29)    1 (1.17)    2 (2.0)     1.3 (1.17)    0.4 (1.17)    0.01 (0.01)
NLD           [0]           10 (17)     1 (1.17)    17 (15)       102 (102)     2.1 (2.29)
FIN                         [0]         4 (1.17)    18 (15)       4 (2.29)      0.1 (2.29)
EST                                     [0]         2.6 (1.17)    0.5 (1.17)    0.01 (0.01)
SWE                                                 [0]           15 (15)       0.35 (2.29)
BEL                                                               [0]           9 (6.41)
LUX                                                                             [0]



This model is a regular exponential family of order 6. The maximum likelihood estimates for cell frequencies exist and are unique. The observed $X^2 = 20.16$ on 14 degrees of freedom yields the asymptotic p-value of 0.12, so the model fits the trade data well. Alternatively, sensitivity of the model fit to other choices regarding GDP could also be studied. □
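The asymptotic p-value quoted for this example can be checked with the closed-form chi-squared survival function, which is available when the degrees of freedom are even: $P(X > x) = e^{-x/2}\sum_{i=0}^{k-1}(x/2)^i/i!$ for $2k$ degrees of freedom. The following is a minimal sketch; the function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability P(X > x) for a chi-squared variable.

    Uses the finite-sum formula that holds only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0    # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom reported for the trade data.
p_value = chi2_sf_even_df(20.16, 14)
```

The resulting `p_value` lies between 0.12 and 0.13, in line with the value reported in the text.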